Remark
Please be aware that these lecture notes are accessible online in an ‘early access’ format. They are actively being developed, and certain sections will be further enriched to provide a comprehensive understanding of the subject matter.
2.3. Understanding Geospatial Data
Geospatial data, a key component of spatial analysis and geographic sciences, refers to information that is linked directly or indirectly to a specific location or geographical area. This section delves into the characteristics of geospatial data, its various forms, and its significance in spatial analysis.
The spatial dimension is a critical aspect of data analysis. Placing observations in their geographic context turns raw data into visualizations that reveal patterns such as migration flows, urban development, and environmental change. Spatial analysis also exposes relationships between data points, such as proximity and clustering. These insights support resource allocation, urban planning, and emergency management, and they ground policy development and governance in spatially referenced evidence.
2.3.1. Types of Geospatial Data
In a Geographic Information System (GIS), real-world observations, meaning any objects or events that can be measured in two or three dimensions, must be translated into simplified spatial representations. This process distills complex real-world detail into basic spatial constructs that can be managed and analyzed within a GIS. These constructs are then modeled in one of two ways:
Vector Data Model: This approach captures the geometry and location of spatial entities using points, lines, and polygons. It is adept at representing discrete features with clear boundaries and precise locations, such as buildings, roads, or administrative borders.
Fig. 2.1 An example of vector data.
Raster Data Model: In this model, the spatial entities are depicted as a uniform grid of cells, with each cell holding a value to represent a particular attribute of that area, such as elevation or temperature. It is suited for continuous data that doesn’t have distinct boundaries, like rainfall distribution or land surface temperatures.
Fig. 2.2 An example of raster data.
Both models are fundamental to GIS, and the choice between them depends on the nature of the data and the requirements of the analysis.
2.3.1.1. Vector Data
Vector data represents real-world features as discrete geometric objects for spatial analysis and geographic information systems (GIS). Its three basic components are:
Points: The most basic form of vector data, points represent discrete locations on the Earth's surface. Each point is defined by a pair of coordinates, such as longitude and latitude (or x and y in a projected coordinate system), and can symbolize locations like cities, wells, or trees.
Example: The dataset from the Calgary Public Library contains information about library locations and their hours of operation. Here, we map only the libraries and their locations.
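As a minimal sketch of how point features are commonly stored, the snippet below builds a GeoJSON FeatureCollection with Python's standard `json` module. The library names and coordinates are hypothetical placeholders, not values from the Calgary Public Library dataset. Note that GeoJSON (RFC 7946) lists each position as `[longitude, latitude]`, the reverse of the everyday "lat, lon" order.

```python
import json

# Hypothetical library locations: names and coordinates are placeholders,
# not taken from the Calgary Public Library dataset.
libraries = [
    {"name": "Library A", "lon": -114.06, "lat": 51.05},
    {"name": "Library B", "lon": -114.13, "lat": 51.09},
]

def to_point_feature(record: dict) -> dict:
    """Wrap one record as a GeoJSON Point feature ([longitude, latitude] order)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [record["lon"], record["lat"]]},
        "properties": {"name": record["name"]},
    }

collection = {
    "type": "FeatureCollection",
    "features": [to_point_feature(r) for r in libraries],
}

# Serialize the first feature to show the stored structure
print(json.dumps(collection["features"][0], indent=2))
```

Non-spatial details such as opening hours would go into each feature's `properties` dictionary.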
Note - Map Scale
The scale in a GIS context is the ratio of a distance on the map to the corresponding distance on the ground. A large-scale map has a larger ratio (1:5,000 is a larger scale than 1:250,000), so map features appear relatively large; such a map covers a smaller area in greater detail. For instance, at a scale of 1:5,000, 1 unit on the map equals 5,000 units in reality.
On these Folium maps, the scale is indicated in both kilometers and miles for convenience, such as 5 km or 5 mi, aiding in quick estimation of distances.
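The scale definition translates directly into arithmetic. The helper names below are hypothetical; the sketch converts a distance measured on a 1:5,000 map into a ground distance and compares two scales by their representative fractions (1/N).

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: float) -> float:
    """Convert a distance measured on a map (cm) to ground distance (m).

    At a scale of 1:N, one unit on the map equals N of the same units on
    the ground, so multiply by N and convert centimeters to meters.
    """
    return map_distance_cm * scale_denominator / 100.0

def is_larger_scale(denominator_a: float, denominator_b: float) -> bool:
    """True if 1:denominator_a is a larger scale than 1:denominator_b."""
    # A larger representative fraction (1/N) means a larger-scale map.
    return 1 / denominator_a > 1 / denominator_b

print(ground_distance_m(3, 5_000))      # 150.0 (3 cm at 1:5,000 is 150 m)
print(is_larger_scale(5_000, 250_000))  # True
```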
Lines: Lines, or polylines, are sequences of points connected by straight segments that represent linear features such as rivers, roads, or utility lines. They are crucial for mapping routes and connections between different points.
Example: The following map displays the LRT tracks for the city of Calgary.
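The length of a line feature is the sum of its segment lengths. The sketch below assumes vertices stored as (longitude, latitude) in degrees and measures each segment with the haversine great-circle formula on a spherical Earth (mean radius 6,371 km), which is an approximation; work in a projected coordinate system would typically use planar distances instead. The track coordinates are hypothetical, not taken from the Calgary LRT data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def polyline_length_m(coords):
    """Sum the lengths of consecutive segments of a polyline of (lon, lat) vertices."""
    return sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))

# Hypothetical three-vertex track (not actual LRT coordinates)
track = [(-114.07, 51.045), (-114.06, 51.045), (-114.06, 51.05)]
print(round(polyline_length_m(track)))
```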
Polygons: Polygons are closed shapes formed by connecting multiple line segments end-to-end. They are used to represent areas like lakes, park boundaries, or property lots. Polygons can be complex, containing holes or consisting of several parts (multipolygons), and they have derived geometric properties such as area, perimeter, and centroid.
Example: The following data is a representation of the City of Calgary’s boundary in a MULTIPOLYGON format.
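The area, perimeter, and centroid of a polygon can be computed directly from its vertices. This sketch uses the shoelace formula and assumes a simple (non-self-intersecting) polygon without holes, in a projected coordinate system measured in meters; geographic coordinates such as those of the Calgary boundary would first need to be projected. The rectangular lot is a hypothetical example.

```python
import math

def polygon_metrics(ring):
    """Area, perimeter, and centroid of a simple polygon (shoelace formula).

    `ring` is a list of (x, y) vertices in planar coordinates (e.g. meters);
    the first vertex is not repeated at the end.
    """
    n = len(ring)
    signed_area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        signed_area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = signed_area2 / 2
    # Dividing by the signed area makes the centroid independent of vertex order
    centroid = (cx / (6 * area), cy / (6 * area))
    return abs(area), perimeter, centroid

# Hypothetical 100 m x 50 m rectangular lot
lot = [(0, 0), (100, 0), (100, 50), (0, 50)]
print(polygon_metrics(lot))  # (5000.0, 300.0, (50.0, 25.0))
```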
Vector data is particularly valuable in applications that require high precision and detail. For example:
Cadastral Mapping: This involves creating maps that show property boundaries and land ownership. Precision is key here, as legal implications are involved.
Navigation Systems: GPS-based navigation tools use vector road networks to provide accurate turn-by-turn directions and route planning.
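Route planning on a vector road network is usually framed as a shortest-path problem on a graph whose nodes are intersections and whose edges are road segments. As an illustrative sketch (not how any particular navigation product works), the code below applies Dijkstra's algorithm to a small hypothetical network with made-up segment lengths.

```python
import heapq

# Hypothetical road network: intersections are nodes, road segments are
# edges weighted by length in meters (illustrative values only).
roads = {
    "A": {"B": 400, "C": 900},
    "B": {"A": 400, "C": 300, "D": 800},
    "C": {"A": 900, "B": 300, "D": 200},
    "D": {"B": 800, "C": 200},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_length, list_of_nodes)."""
    queue = [(0, start, [start])]  # (distance so far, node, path taken)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return float("inf"), []  # goal unreachable

print(shortest_route(roads, "A", "D"))  # (900, ['A', 'B', 'C', 'D'])
```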
2.3.1.2. Raster Data
Raster data is a type of geospatial data representation that uses a matrix of cells, commonly referred to as pixels, to model the Earth’s surface and various phenomena. This method is particularly effective for capturing and conveying information that changes continuously over space, such as elevation, temperature, or land cover.
Grid of Pixels: Imagine a raster as a digital canvas where each pixel is a square paint dab. Each dab (pixel) carries specific information about that tiny square of the real world.
Pixel Values: The value of each pixel can represent different types of data. For example, in a temperature map, the pixel value might indicate the temperature at that location; in a digital elevation model, it would represent the height above sea level.
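Because a raster is a regular grid, the real-world location of any pixel can be computed from the grid's origin and cell size. The sketch below assumes a north-up raster with square cells, a simplified version of the affine transform that georeferenced formats such as GeoTIFF store; the origin and the 30 m cell size are hypothetical.

```python
def pixel_center(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of the center of raster cell (row, col).

    Assumes a north-up raster whose top-left corner is (origin_x, origin_y)
    and whose square cells are `cell_size` units wide. Rows increase
    downward (southward), so y decreases as the row index increases.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

# Hypothetical 30 m grid with its top-left corner at (500000, 5660000)
print(pixel_center(0, 0, 500_000, 5_660_000, 30))  # (500015.0, 5659985.0)
print(pixel_center(2, 3, 500_000, 5_660_000, 30))  # (500105.0, 5659925.0)
```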
Example: Imagine a 2D array whose values increase evenly from 0 to 100. Applying a colormap to a 2D plot of this array turns the numbers into a visual representation. The same approach is used to visualize datasets such as elevation and land surface temperature, where colors help reveal meaningful patterns in numerical values.
import numpy as np
import matplotlib.pyplot as plt
# Create a gradient 10x10 array with values scaling from 0 to 100
X = np.linspace(0, 100, num=100).reshape((10, 10))
# Create the figure and axis objects with a specified size
fig, ax = plt.subplots(figsize=(6, 6))
# Display the array as an image with the 'Spectral_r' colormap
im = ax.imshow(X, cmap='Spectral_r', interpolation='nearest')
# Set the aspect ratio of the axis to be equal
ax.set_aspect('equal')
# Add a colorbar to the figure with specified fraction and padding
cbar = fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label('Value', rotation=270, labelpad=20, fontsize=16)
cbar.ax.tick_params(labelsize=16)
# Set the ticks to be at the borders of the cells
ax.set_xticks(np.arange(-.5, 10, 1), minor=True)
ax.set_yticks(np.arange(-.5, 10, 1), minor=True)
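# Remove the tick labels on both axes
ax.set_xticklabels([])
ax.set_yticklabels([])
# Disable the default grid, then draw white lines along the minor ticks (the cell borders)
ax.grid(False)
ax.grid(True, which='minor', color='white', linestyle='-', linewidth=2)
# Adjust layout to ensure everything fits without overlap
plt.tight_layout()
# Render the figure (needed when running as a script rather than in a notebook)
plt.show()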
Some common uses of raster data include:
Satellite Imagery: These images are raster data in which each pixel corresponds to a specific area on the Earth's surface; properties such as land cover and land surface temperature can be derived from the recorded pixel values.
Elevation Models: Digital Elevation Models (DEMs) use raster data to represent the terrain. Each pixel’s value indicates the elevation at that specific point, which is essential for flood modeling, land use planning, and even 3D visualization.
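As an example of the terrain analysis a DEM supports, slope can be derived from the elevation differences between neighboring cells. This sketch uses NumPy's `np.gradient` on a small hypothetical DEM with 30 m cells; dedicated GIS tools may use other neighborhood schemes (for example, Horn's method), so their values can differ slightly.

```python
import numpy as np

# Hypothetical 4 x 4 DEM (meters) on a 30 m grid; values are illustrative.
dem = np.array([
    [1100, 1105, 1110, 1115],
    [1095, 1100, 1105, 1110],
    [1090, 1095, 1100, 1105],
    [1085, 1090, 1095, 1100],
], dtype=float)
cell_size = 30.0

# np.gradient returns the rate of change along rows first, then columns,
# using central differences inside the grid and one-sided ones at the edges.
dz_drow, dz_dcol = np.gradient(dem, cell_size)

# Slope magnitude in degrees (the sign convention of the row axis does not matter here)
slope_deg = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
print(slope_deg.round(1))
```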
Example: The following example uses Earth Engine and geemap to visualize elevation and water occurrence around Calgary. As with land surface temperature or other environmental data, these tools turn numerical raster values into interpretable maps that support the kind of analysis and decision-making discussed at the start of this section.
Fig. 2.3 visualizes two key layers of environmental data:
Elevation Layer: This layer is derived from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) Version 4. It represents the Earth's surface elevation with a color palette stretched over 0 to 4000 meters; the palette defined in the visualization parameters assigns a color to each elevation value, providing a visual representation of the terrain.
Water Occurrence Layer: This layer comes from the Joint Research Centre (JRC) Global Surface Water dataset. It shows how often water was observed at each location, which outlines water bodies such as streams and lakes within the map area. Water features are highlighted in blue, and the layer is masked to display only areas where water occurs, which increases the contrast against land surfaces.
The map includes a color bar for the elevation layer, labeled with “Elevation (m)” to indicate the altitude in meters above sea level. The map is interactive, allowing users to zoom in and out and explore different regions within the vicinity of Calgary. The combination of these layers provides valuable insights into the topography and hydrography of the region, which can be useful for environmental studies, urban planning, and recreational activities.
Fig. 2.3 This figure presents a detailed visualization of Calgary's topography and hydrography. The elevation data, sourced from the SRTM DEM Version 4, is color-coded to represent varying altitudes, providing a clear depiction of the terrain's undulations. Overlaying this is the JRC Global Surface Water layer, which highlights the city's water bodies in blue, distinguishing them from the surrounding land. Together, these layers offer a comprehensive view of Calgary's natural landscape, essential for environmental analysis and urban planning.
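The masking step described for the water occurrence layer can be illustrated without Earth Engine. In the sketch below, a small hypothetical occurrence grid (percent of observations in which water was detected) is masked with NumPy so that only cells with water remain. This is an analogy to what a masked map layer shows, not the geemap code behind Fig. 2.3; by default, matplotlib's `imshow` leaves masked cells transparent, which produces the same visual effect.

```python
import numpy as np

# Hypothetical water-occurrence grid: percent of observations in which
# water was detected (0 = never observed as water).
occurrence = np.array([
    [0, 0, 15, 80],
    [0, 40, 95, 100],
    [0, 0, 10, 60],
])

# Mask every cell where water never occurs, so only water cells remain.
water_only = np.ma.masked_equal(occurrence, 0)

print(water_only.count())  # 7 unmasked (water) cells
print(water_only.mean())   # mean occurrence over the water cells only
```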
2.3.2. Vector vs. Raster Data Models
Vector and raster data models are fundamental in GIS for representing spatial information. Each has unique characteristics that make them suitable for different types of spatial analysis [GISGeography, 2015, Atlas, 2024].
2.3.2.1. Vector Data Advantages
Vector data’s structure allows for complex analyses and representations of the real world, from the precision of property lines to the connectivity of road networks.
Precision and Accuracy: Vector data can represent boundaries and features with a high degree of accuracy, which is essential for detailed mapping and analysis.
Scalability: Unlike raster data, vector data can be displayed at any zoom level without pixelation, which makes it well suited to applications that require zooming in and out. Its accuracy is still limited, however, by how precisely the features were originally captured.
Efficient Storage: For many types of geographical data, vector formats require less storage space, especially when representing sparse data.
Topology: Vector data can encode topology, that is, the spatial relationships between features such as adjacency, connectivity, and containment, in addition to their locations.
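To make the storage argument concrete, the sketch below compares the in-memory size of three hypothetical point features with a 1000 × 1000 raster covering the same area, after burning the points into the grid. It ignores file headers, attributes, and compression (which can shrink a mostly empty raster considerably), so the numbers illustrate orders of magnitude rather than real file sizes. Point coordinates are assumed to be in cell units, with x as the column and y as the row.

```python
import numpy as np

# Three hypothetical point features as (x, y) pairs in cell units
points = np.array([[512.0, 130.0], [20.5, 870.0], [700.0, 700.0]])

# The same information as a 1000 x 1000 presence/absence raster (1 byte per cell)
grid = np.zeros((1000, 1000), dtype=np.uint8)
rows = points[:, 1].astype(int)
cols = points[:, 0].astype(int)
grid[rows, cols] = 1  # burn each point into the cell containing it

print(points.nbytes)  # 48 bytes: 3 points x 2 coordinates x 8 bytes
print(grid.nbytes)    # 1000000 bytes, almost all of them zeros
```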
2.3.2.2. Vector Data Disadvantages
Despite its advantages, vector data can present challenges, particularly when dealing with large datasets or complex spatial relationships.
Complex Data Structure: Vector data can be complex to manage due to the relationships between points, lines, and polygons. This complexity can make data management and analysis more challenging.
Computational Intensity: Certain spatial operations on vector data, such as network analysis or overlay analysis, can be computationally intensive and time-consuming.
2.3.2.3. Raster Data Advantages
Raster data’s simplicity and suitability for certain types of analysis make it a valuable tool in the GIS toolkit, especially when dealing with large, continuous datasets.
Simplicity: Raster data is conceptually simpler and easier to work with, making it accessible to a wide range of users.
Suitable for Continuous Data: Raster is ideal for representing continuous data, such as elevation or temperature gradients, where the phenomenon is measured across the landscape.
Fast Analysis: For certain types of spatial analysis, raster data can be processed quickly due to its regular grid structure.
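Because co-registered rasters share the same grid, cell-by-cell operations (often called map algebra) reduce to array arithmetic. The sketch below assumes two hypothetical 3 × 3 rasters, a land surface temperature grid in kelvin and an urban mask, and finds urban cells warmer than 20 °C in a single vectorized expression.

```python
import numpy as np

# Two hypothetical co-registered rasters on the same 3 x 3 grid
lst_kelvin = np.array([[290.0, 295.5, 300.2],
                       [288.1, 293.0, 299.9],
                       [285.4, 291.7, 297.3]])
is_urban = np.array([[0, 1, 1],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=bool)

# Local (cell-by-cell) operations run on the whole grid at once
lst_celsius = lst_kelvin - 273.15
hot_urban = (lst_celsius > 20) & is_urban

print(hot_urban.sum())  # 2 urban cells warmer than 20 °C
```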
2.3.2.4. Raster Data Disadvantages
Raster data’s reliance on resolution can be a limiting factor, affecting everything from the accuracy of feature representation to the size of the data files.
Resolution Dependency: The level of detail in raster data is tied to the resolution of the pixels. Higher resolution means more detail but also larger file sizes.
Spatial Inaccuracies: The limits imposed by raster cell dimensions can lead to spatial inaccuracies, especially when representing small or narrow features.
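The effect of resolution on data volume is easy to estimate: halving the cell size roughly quadruples the number of cells. This sketch assumes a hypothetical 10 km × 10 km area stored as uncompressed 4-byte float32 values, and also shows how a narrow feature is exaggerated when it is rasterized into coarse cells.

```python
import math

def cell_count(extent_m: float, cell_size_m: float) -> int:
    """Number of square cells needed to cover a square extent."""
    per_side = math.ceil(extent_m / cell_size_m)
    return per_side * per_side

EXTENT_M = 10_000   # hypothetical 10 km x 10 km study area
BYTES_PER_CELL = 4  # float32, uncompressed

for size in (30, 10):
    n = cell_count(EXTENT_M, size)
    print(f"{size} m cells: {n:,} cells, about {n * BYTES_PER_CELL / 1e6:.1f} MB")

# A 5 m wide road rasterized into 30 m cells occupies at least one full cell,
# so it appears at least six times wider than it really is.
road_width_m, coarse_cell_m = 5, 30
print(coarse_cell_m / road_width_m)  # 6.0
```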
Both vector and raster data models have their place in GIS and are often used complementarily. Vector data is typically used for precise mapping and detailed analysis of discrete features, while raster data is used for modeling and analyzing continuous phenomena. The choice between vector and raster data depends on the specific requirements of the project, the nature of the spatial data, and the type of analysis to be performed.
2.3.3. Attribute Tables
In Geographic Information Systems (GIS), attribute tables are essential components that store non-spatial data linked to spatial features. Each spatial feature on a map, such as a building, road, or land parcel, corresponds to a record in the attribute table. This record is connected to the feature through a unique numerical identifier known as a Feature Identifier (FID). For example, a park (spatial feature) on a GIS map may have an FID of 102, and its corresponding record in the attribute table could include attributes like area, vegetation type, and usage regulations.
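The link between a feature and its attribute record can be sketched with two Python dictionaries keyed by FID, following the park example above. The geometry, attribute names, and values are hypothetical, and real GIS formats manage this association internally rather than through separate dictionaries.

```python
# Hypothetical stores: geometry and attributes kept separately, linked by FID
geometries = {
    102: {"type": "Polygon",
          "coordinates": [[(0, 0), (120, 0), (120, 80), (0, 80), (0, 0)]]},
}
attributes = {
    102: {"area_m2": 9600, "vegetation": "mixed grass", "usage": "no motor vehicles"},
}

def feature(fid: int) -> dict:
    """Join a feature's geometry and its attribute record through the FID."""
    return {"fid": fid, "geometry": geometries[fid], **attributes[fid]}

park = feature(102)
print(park["vegetation"])  # mixed grass
```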
Example: The table below shows a snippet of the attribute table for the Calgary Community Boundaries dataset. Each row is one spatial feature, such as a residential community or a park, identified by a unique index that serves as its Feature Identifier (FID). The columns store non-spatial attributes such as class, class code, community code, name, and sector, alongside the geometry itself (both as a MULTIPOLYGON text column and as a geometry column). Linking descriptive attributes to geometries in this way is what lets a GIS combine spatial and non-spatial information for analysis and decision-making.
| | CLASS | CLASS_CODE | COMM_CODE | NAME | SECTOR | SRG | COMM_STRUCTURE | CREATED_DT | MODIFIED_DT | MULTIPOLYGON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Residential | 1 | LEB | LEWISBURG | NORTH | DEVELOPING | BUILDING OUT | 2016/12/21 | 2019/11/26 | MULTIPOLYGON (((-114.0480237 51.1749865, -114.... | MULTIPOLYGON (((-114.0480237 51.1749865, -114.... |
| 1 | Residential | 1 | CSC | CITYSCAPE | NORTHEAST | DEVELOPING | BUILDING OUT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-113.9524996 51.1543075, -113.... | MULTIPOLYGON (((-113.9524996 51.1543075, -113.... |
| 2 | Industrial | 2 | ST1 | STONEY 1 | NORTH | N/A | EMPLOYMENT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-114.0133015 51.1744266, -114.... | MULTIPOLYGON (((-114.0133015 51.1744266, -114.... |
| 3 | Residential | 1 | MRT | MARTINDALE | NORTHEAST | ESTABLISHED | 1980s/1990s | 2016/12/21 | 2020/10/22 | MULTIPOLYGON (((-113.9648991 51.1251901, -113.... | MULTIPOLYGON (((-113.9648991 51.1251901, -113.... |
| 4 | Industrial | 2 | ST2 | STONEY 2 | NORTHEAST | N/A | EMPLOYMENT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-113.9939281 51.153327, -113.9... | MULTIPOLYGON (((-113.9939281 51.153327, -113.9... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 308 | Residential | 1 | DRN | DEER RUN | SOUTH | ESTABLISHED | 1980s/1990s | 2016/12/21 | 2024/04/15 | MULTIPOLYGON (((-114.0118593 50.9381207, -114.... | MULTIPOLYGON (((-114.0118593 50.9381207, -114.... |
| 309 | Major Park | 3 | FPK | FISH CREEK PARK | | | PARKS | 2024/04/02 | 2024/04/15 | MULTIPOLYGON (((-114.1109815 50.9214266, -114.... | MULTIPOLYGON (((-114.1109815 50.9214266, -114.... |
| 310 | Residual Sub Area | 4 | 02L | 02L | | | OTHER | 2016/12/21 | 2024/05/13 | MULTIPOLYGON (((-114.0945798 51.2123357, -114.... | MULTIPOLYGON (((-114.0945798 51.2123357, -114.... |
| 311 | Residential | 1 | ABR | AMBLERIDGE | NORTH | DEVELOPING | BUILDING OUT | 2024/05/13 | 2024/05/13 | MULTIPOLYGON (((-114.1295323 51.1977901, -114.... | MULTIPOLYGON (((-114.1295323 51.1977901, -114.... |
| 312 | Residential | 1 | GLR | GLACIER RIDGE | NORTH | DEVELOPING | BUILDING OUT | 2020/06/01 | 2024/05/13 | MULTIPOLYGON (((-114.1679438 51.196922, -114.1... | MULTIPOLYGON (((-114.1679438 51.196922, -114.1... |
313 rows × 11 columns
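Attribute tables can be queried like ordinary tables. The sketch below copies five rows from the table above into a pandas DataFrame (geometry omitted for brevity) and runs a filter and a group-by. With GeoPandas, the same operations would apply to a GeoDataFrame loaded from the full dataset, since a GeoDataFrame extends the pandas DataFrame.

```python
import pandas as pd

# Five rows transcribed from the attribute table above (geometry omitted)
communities = pd.DataFrame({
    "CLASS": ["Residential", "Residential", "Industrial", "Residential", "Industrial"],
    "COMM_CODE": ["LEB", "CSC", "ST1", "MRT", "ST2"],
    "NAME": ["LEWISBURG", "CITYSCAPE", "STONEY 1", "MARTINDALE", "STONEY 2"],
    "SECTOR": ["NORTH", "NORTHEAST", "NORTH", "NORTHEAST", "NORTHEAST"],
    "SRG": ["DEVELOPING", "DEVELOPING", "N/A", "ESTABLISHED", "N/A"],
})

# Attribute query: select only residential communities
residential = communities[communities["CLASS"] == "Residential"]
print(residential["NAME"].tolist())  # ['LEWISBURG', 'CITYSCAPE', 'MARTINDALE']

# Summary: number of features per sector
print(communities.groupby("SECTOR").size())
```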
2.3.4. Raster Data Attributes
Raster data play a crucial role in Geographic Information Systems (GIS), representing spatial information through the value assigned to each pixel. Pixel values need not be continuous measurements: they can also be unique integer codes that categorize each cell and link it to a set of attributes. This categorization is especially important in land cover datasets, where pixel values denote features such as water bodies, forests, and urban areas. Each category is then described in an attribute table, which can record characteristics such as water quality, forest density, or the regulations governing urban zones.
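A minimal sketch of a raster attribute table: a small hypothetical grid of integer land cover codes, a lookup table that gives each code a class name, and a count of cells per class. The code/name pairs follow the IGBP legend as used in the MODIS MCD12Q1 LC_Type1 band, to the best of our knowledge; check them against the product documentation before relying on them.

```python
import numpy as np

# Hypothetical categorical raster of land cover codes
land_cover = np.array([
    [10, 10, 12, 12],
    [10, 13, 13, 12],
    [17, 17, 13, 12],
])

# Value attribute table: code -> class name (IGBP legend, to be verified)
vat = {10: "Grasslands", 12: "Croplands", 13: "Urban and Built-up Lands", 17: "Water Bodies"}

# Count how many cells fall into each category
codes, counts = np.unique(land_cover, return_counts=True)
for code, count in zip(codes.tolist(), counts.tolist()):
    print(f"{code:>2}  {vat[code]:<26} {count} cells")
```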
Example - Visualizing Raster Data with Heatmaps: To illustrate this concept, imagine a 10x10 matrix that represents a raster. This hypothetical raster data can be depicted through a heatmap, where a colorbar indicates the value of each cell by assigning specific colors. This technique is commonly applied in various types of raster data visualization, such as land surface temperature, elevation, and more, to effectively convey differences in values.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Set the random seed for reproducibility
np.random.seed(0)
# Generate a random 10x10 array with values between 0 and 255
X = np.random.randint(256, size=(10, 10))
# Create the figure and axes objects with a specified size
fig, ax = plt.subplots(figsize=(6, 6))
# Create a heatmap using seaborn with annotated values and a specified color map
sns.heatmap(X, annot=True, fmt="d", cmap='Spectral', cbar_kws={'label': 'Color Intensity', 'fraction': 0.046}, ax=ax)
# Customize the colorbar
cbar = ax.collections[0].colorbar
cbar.set_label('Color Intensity', rotation=270, labelpad=20, fontsize=16)
cbar.ax.tick_params(labelsize=16)
# Add a title to the heatmap
ax.set_title("Heatmap with Annotated Pixels", fontsize=15)
# Disable grid lines
ax.grid(False)
# Ensure the heatmap cells are square and adjust the layout
ax.set_aspect('equal')
plt.tight_layout()
# Render the figure (needed when running as a script rather than in a notebook)
plt.show()
The heatmap makes the value of each pixel explicit, and discrete values like these could stand for land cover categories. However, not all raster formats support attribute tables, and attribute tables are typically used with integer (categorical) rasters rather than continuous ones. In many GIS applications, rasters are used without attribute tables, and the pixel values alone carry the necessary spatial information.
Example:
To demonstrate a practical application of raster data visualization, let's take the MODIS Land Cover Type Product (MCD12Q1 - v061) as an example. Using Python and geemap, we can generate a map and overlay a layer that displays the land cover data with a predefined color palette aligned with the International Geosphere-Biosphere Programme (IGBP) land cover classification system.
Fig. 2.4 provides a visual representation of land cover types in the area surrounding Calgary, Alberta, Canada, based on the MODIS Land Cover Type data for the year 2013. The map is color-coded according to the International Geosphere-Biosphere Programme (IGBP) land cover classification scheme, which includes various categories such as forests, shrublands, wetlands, and urban areas. Each category is assigned a specific color, making it easy to identify different land cover types at a glance. The map also features a legend that correlates the colors with the land cover categories, aiding in interpretation. This visualization serves as a valuable tool for understanding the distribution and extent of different ecosystems and land uses in the region.
Fig. 2.4 This map illustrates the diverse land cover types in the vicinity of Calgary, using the MODIS Land Cover Type data from 2013. The color-coded representation reflects the IGBP classification, providing insights into the region's ecological diversity, from urban areas to natural vegetation and water bodies.
2.3.5. Measurement Levels
Attributes in GIS are categorized into four measurement levels, each with distinct characteristics:
Nominal Data: These are categorical data without any numeric significance or order. For example, land use types such as residential, commercial, and industrial are nominal data.
Ordinal Data: This data type has a ranked order but no fixed interval between ranks. A soil erosion risk map might classify areas as low, moderate, or high risk, which are ordinal data.
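Interval Data: Numeric data with equal intervals but no true zero point. Temperatures in Celsius and Fahrenheit are interval data: a one-degree difference is the same everywhere on the scale, but the zero point is arbitrary (0 °C is simply the freezing point of water), so ratios are not meaningful and 20 °C is not twice as warm as 10 °C. Temperature in kelvin, which starts at absolute zero, is ratio data.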
Ratio Data: Similar to interval data but with a meaningful zero point, allowing for the comparison of relative magnitudes. Examples include population counts and annual rainfall measurements, where zero represents none or no occurrence.
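The measurement level determines which operations are meaningful. The sketch below uses small hypothetical samples to show the mode for nominal data, the median for ordinal data, differences for interval data, and ratios for ratio data; converting Celsius values to kelvin, a scale that starts at absolute zero, shows why a ratio of Celsius temperatures is misleading.

```python
import statistics

# Hypothetical samples for each measurement level
land_use = ["residential", "commercial", "residential", "industrial"]  # nominal
erosion_risk = ["low", "high", "moderate", "low", "high"]              # ordinal
temps_c = [10.0, 20.0]                                                  # interval
rainfall_mm = [200.0, 400.0]                                            # ratio

# Nominal: categories can only be counted, so the mode is the natural summary
print(statistics.mode(land_use))  # residential

# Ordinal: order matters but gaps are not equal, so use the median rank
rank = {"low": 0, "moderate": 1, "high": 2}
median_rank = statistics.median_low([rank[r] for r in erosion_risk])
print([label for label, r in rank.items() if r == median_rank][0])  # moderate

# Interval: differences are meaningful...
print(temps_c[1] - temps_c[0])  # 10.0
# ...but ratios are not: in kelvin, 20 °C is only about 3.5 % above 10 °C
kelvin = [t + 273.15 for t in temps_c]
print(round(kelvin[1] / kelvin[0], 3))  # 1.035

# Ratio: a true zero makes "twice as much" meaningful
print(rainfall_mm[1] / rainfall_mm[0])  # 2.0
```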